Abstract

We show that two important problems that have applications in computational
biology are ASP-complete, which implies that, given a solution to a problem, it is
NP-complete to decide if another solution exists. We show first that a variation of
Betweenness, which is the underlying problem of questions related to radiation hybrid
mapping, is ASP-complete. Subsequently, we use that result to show that Quartet
Compatibility, a fundamental problem in phylogenetics that asks whether a set of
quartets can be represented by a parent tree, is also ASP-complete. The latter result
shows that Steel's Quartet Challenge, which asks whether a solution to Quartet
Compatibility is unique, is coNP-complete.

1 Introduction 

Many biological problems focus on synthesizing data to yield new information. We focus on
the complexity of two such problems. The first is motivated by radiation hybrid mapping (RH
mapping) [10], which was developed to construct long-range maps of mammalian chromosomes
(e.g. [U [8j E]). Roughly, RH mapping uses x-rays to break the DNA into fragments and
gives the relative order of DNA markers on the fragments [10]. The underlying computational
problem is to assemble these fragments into a single strand (i.e., a "linear order"). As Chor
and Sudan [7] show, the assembly of these fragments can be modeled by the well-known
decision problem Betweenness. Loosely speaking, this problem asks if there exists a total
ordering over a set of elements that satisfies a set of constraints, each specifying one element
to lie between two other elements (see Section 2 for a more detailed definition). Since this
classical problem is NP-complete [22], Chor and Sudan [7] developed a polynomial-time
approximation. Their algorithm, by using a geometric approach, either reports that no
betweenness ordering exists or returns an ordering that satisfies at least one half of
the given constraints. However, in the context of RH mappings, one usually knows the
so-called 3' and 5' end of the DNA under consideration. Therefore, we consider a variation of
Betweenness, called cBetweenness, whose instances contain not only a collection
of constraints but also an explicit specification of the first and last DNA marker, and that
asks whether or not there exists a total ordering that satisfies the constraints and whose
first and last element coincide with the first and last DNA marker, respectively. In this
paper, we are particularly interested in the following question related to cBetweenness:
given a solution to an instance of cBetweenness, is there another solution? A positive
answer to this question may imply that the sampling of relatively ordered DNA markers is
not large enough to determine the correct ordering of the entire set of DNA markers.

Our second question focuses on finding the optimal phylogenetic tree for a set of taxa
(e.g. species). Under the most popular optimization criteria (maximum parsimony and
maximum likelihood), it is NP-hard to find the optimal tree [13, 21]. Despite the
NP-hardness, several approaches to this problem exist, one of which finds a phylogenetic tree
by splitting the problem into subproblems, solving the subproblems, and then recombining the
solutions to form a complete phylogenetic tree that represents all taxa under consideration
[U IH [19]. To this end, quartets, which are phylogenetic trees on 4 taxa, are often used.
Given 4 taxa a, b, c, and d, there are three possible ways to arrange them:

ab|cd    ac|bd    ad|bc

Since the number of possible topologies grows exponentially with the number of taxa, it is
easier to decide the arrangement or topology of each subset of 4 taxa than the topology
for a large set of taxa. This leads to the question: how hard is it to build a tree from
quartets? If Q denotes the set of all quartets of a phylogenetic tree T, then T is uniquely
determined by Q and can be reconstructed in polynomial time [12]. However, in most cases
T is not given, and Q is often incomplete (i.e. there exists a set of 4 taxa for which no
quartet is given) or elements in Q contradict each other. In such a case, it is NP-complete
to decide whether there exists a phylogenetic tree on n taxa that displays Q [27]; that
is, a phylogenetic tree that explains all the ancestral relationships given by the quartets,
where n is the number of taxa over all elements in Q. Because of the NP-completeness
of this latter problem, called Quartet Compatibility, algorithmic approaches are rare.
Nevertheless, several attractive graph-theoretic characterizations of the problem exist
[16, 25]. While [25] approaches the problem by using a chordal-graph characterization on an
underlying intersection graph, [16] establishes a so-called quartet graph and edge colorings
via this graph to decide whether or not there exists a phylogenetic tree that displays a given
set of quartets. As a follow-up on this last question, Steel asks whether or not the following
problem, called the Quartet Challenge, is NP-hard [28]: given a set Q of quartets
over n taxa and a phylogenetic tree T on n taxa that displays Q, is T the unique such
tree that displays Q? Although the above-mentioned characterizations [16, 25] comprise
several results on when a set of quartets is displayed by a unique phylogenetic tree, the
computational complexity of the Quartet Challenge remains open. We note that if a
set Q of quartets that does not contain any redundant information is displayed by a unique
phylogenetic tree on n taxa, then the minimum size of Q is $n - 3$ [26, Corollary 6.3.10] while
the current largest maximum size is $2n - 8$ [11].

To investigate the computational complexity of problems for when a solution is given and 
one is interested in finding another solution, Yato and Seta [30] developed the framework 
of Another Solution Problems (ASP). Briefly, if a problem is ASP-complete, then 
given a solution to a problem, it is NP-complete to decide if a distinct solution exists. Many 
canonical problems are ASP-complete, such as several variations of satisfiability, as well as 
games like Sudoku [30].

In this paper, we show that, given a solution to an instance of cBetweenness or
Quartet Compatibility, finding a second solution is ASP-complete. To show that
cBetweenness is ASP-complete, we use a reduction from a variant of satisfiability, namely,
Not-All-Equal-3SAT with Constants (see Section 2 for a detailed definition). Using
this result, we establish a second reduction that subsequently shows that Quartet
Compatibility is also ASP-complete. As we will soon see, the ASP-completeness of Quartet
Compatibility implies coNP-completeness of the Quartet Challenge.

We note that while this paper was in preparation, a preprint was released [H] that
addresses the complexity of the Quartet Challenge using different techniques than
employed here.

This paper is organized as follows: Section 2 details background information from
complexity theory and phylogenetics. Section 3 gives the two reduction results, and Section 4
contains some concluding remarks.

2 Preliminaries 

This section gives an outline of the ASP-completeness concept and formally states the
decision problems that are needed for this paper. Preliminaries in the context of phylogenetics
are given in the second part of this section.

2.1 Computational complexity

Notation and terminology introduced in this section follows [H] and [30], with the former
being an excellent reference for general complexity results.

ASP-completeness. The notion of ASP-completeness was first published by Yato and
Seta [30]. Their paper provides a theoretical framework to analyze the computational
complexity of problems whose input contains, among others, a solution to a given problem
instance, and the objective is to find a distinct solution to that instance or to return that
no such solution exists. To this end, the authors use function problems, whose answers can
be more complex, in contrast to decision problems, which are always answered with either
'yes' or 'no'. Formally, the complexity class FNP contains each function problem $\Pi$ that
satisfies the following two conditions:

(i) There exists a polynomial $p$ such that the size of each solution to a given instance $\psi$
of $\Pi$ is bounded by $p(\psi)$.

(ii) Given an instance $\psi$ of $\Pi$ and a solution $s$, it can be decided in polynomial time if $s$ is
a solution to $\psi$.

Note that the complexity class FNP is a generalization of the class NP and that each function
problem in FNP has an analogous decision problem.

Now, let $\Pi$ and $\Pi'$ be two function problems. We say that a polynomial-time reduction
$f$ from $\Pi$ to $\Pi'$ is an ASP-reduction if for any instance $\psi$ of $\Pi$ there is a bijection from the
solutions of $\psi$ to the solutions of $f(\psi)$, where $f(\psi)$ is an instance of $\Pi'$ that has been
obtained under $f$. Note that, while each ASP-reduction is a so-called 'parsimonious reduction',
the converse is not necessarily true. Parsimonious reductions have been introduced in the
context of enumeration problems (for details, see [23]). Furthermore, a function problem $\Pi'$
is ASP-complete if and only if $\Pi' \in$ FNP and there is an ASP-reduction from $\Pi$ to $\Pi'$ for
any function problem $\Pi \in$ FNP.

Remark. Throughout this paper, we prove that several problems are ASP-complete.
Although these problems are stated as decision problems in the remainder of this section, it
should be clear from the context which associated function problems we consider. More
precisely, for a decision problem $\Pi_d$, we consider the function problem $\Pi$ whose instance
$\psi$ consists of the same parameters as an instance of $\Pi_d$ and, additionally, of a solution to
$\psi$, and whose question is to find a distinct solution that fulfills all conditions given in the
question of $\Pi_d$. Hence, if $\Pi$ is ASP-complete, then this implies that, unless P=NP, it is
computationally hard to find a second solution to an instance of $\Pi$.

Satisfiability (SAT) problems. The satisfiability problem is a well-known problem
in the study of computational complexity. In fact, it was the first problem shown to be
NP-complete [HI E]. Before we can formally state the problem, we need some definitions.
Let $V = \{x_1, x_2, \ldots, x_n\}$ be a set of variables. A literal is either a variable $x_i$ or its negation
$\bar{x}_i$, and a clause is a disjunction of literals. Now let C be a conjunction of clauses (for an
example, consider the four clauses given in Figure 1). A truth assignment for C assigns each
literal to either true or false such that, for each $i \in \{1, 2, \ldots, n\}$, $x_i$ = true if and only if
$\bar{x}_i$ = false. We say that a literal is satisfied (resp. falsified) if its truth value is true (resp.
false).
$\sigma_1$: $\{x_1, x_2\} \to$ true; $\{x_3, x_4\} \to$ false
$\sigma_2$: $\{x_1, x_3, x_4\} \to$ true; $\{x_2\} \to$ false

Figure 1: Left: An example of 4 clauses on the variables $x_1, \ldots, x_4$. Right: Truth assignments
to the non-negated literals. As an instance of 3SAT, there are several possible satisfying truth
assignments including $\sigma_0$, $\sigma_1$, and $\sigma_2$. When viewed as an instance of NAE-3SAT, $\sigma_0$ is not
satisfying since it assigns all literals of the first clause $C_1$ to true, violating the 'not all equal'
condition.

Problem: SATISFIABILITY

Instance: A set of variables V and a conjunction C of clauses over V.
Question: Does there exist a truth assignment for C such that each clause
contains at least one literal assigned to true?

3SAT is a special case of the general SAT problem in which each clause of a given instance
contains exactly three literals. Referring back to Figure 1, all three truth assignments $\sigma_0$, $\sigma_1$,
and $\sigma_2$ satisfy the four clauses when regarded as an instance of 3SAT. The next theorem
is due to [30, Theorem 3.5].

Theorem 2.1. 3SAT is ASP-complete.

We will next show the ASP-completeness of another version of SAT that is similar to the 
following decision problem. 

Problem: Not-All-Equal-3SAT (NAE-3SAT)

Instance: A set of variables V and a conjunction C of 3-literal clauses over V.
Question: Is there a truth assignment such that for each clause there is a literal
satisfied and a literal falsified by the assignment?

As an example, see Figure 1 and note that $\sigma_0$ does not satisfy the four clauses when regarded
as an instance of NAE-3SAT. It is an immediate consequence of the definition of NAE-3SAT
that, given a solution S to an instance, a second solution to this instance can be calculated
in polynomial time by taking the complement of S; that is, assigning each literal to true
(resp. false) if it is assigned to false (resp. true) in S (see [21]).
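
The complement observation can be checked mechanically. The following Python sketch is illustrative only: the clause encoding (signed integers, with -i standing for the negation of x_i), the helper names, and the sample clauses are assumptions made for this example and are not taken from the paper.

```python
from itertools import product

def nae_satisfied(clauses, assignment):
    """Check the not-all-equal condition for every 3-literal clause.

    clauses: list of 3-tuples of nonzero ints; -i denotes the negation of x_i.
    assignment: dict mapping variable index i to True/False.
    """
    def value(literal):
        v = assignment[abs(literal)]
        return v if literal > 0 else not v

    # A clause is NAE-satisfied when its literals take both truth values.
    return all(len({value(lit) for lit in clause}) == 2 for clause in clauses)

def complement(assignment):
    """Flip the truth value of every variable."""
    return {var: not val for var, val in assignment.items()}

# Hypothetical instance on x_1..x_4 (not the clauses of Figure 1).
clauses = [(1, 2, 3), (-1, 3, 4), (1, -3, -4)]
solution = {1: True, 2: False, 3: True, 4: False}

# Complementing never changes NAE-satisfaction, for any assignment.
complement_invariant = all(
    nae_satisfied(clauses, dict(zip((1, 2, 3, 4), bits)))
    == nae_satisfied(clauses, complement(dict(zip((1, 2, 3, 4), bits))))
    for bits in product((False, True), repeat=4)
)
```

Because a second solution is always available this way, the "another solution" question for plain NAE-3SAT is trivial, which is what motivates the variant with constants introduced next.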

The next decision problem can be obtained from NAE-3SAT by allowing for instances 
that contain the constants T or F. 

Problem: Not-All-Equal-3SAT with Constants (CNAE-3SAT)
Instance: A set of variables V, constants T and F, and a conjunction C of
3-literal clauses over $V \cup \{T, F\}$.

Question: Is there a truth assignment such that the constants T and F are
assigned true and false, respectively, and for each clause, there is a literal or
constant satisfied and a literal or constant falsified by the assignment?

In the case of CNAE-3SAT, we cannot always obtain a second solution from a first one
by taking its complement. For instance, if an instance of CNAE-3SAT contains the clause
$a_k \vee b_k \vee T$, then the assignment $a_k = b_k$ = false is valid, while the complementary assignment
$a_k = b_k$ = true is not. In fact, we next show that CNAE-3SAT is ASP-complete.

Theorem 2.2. CNAE-3SAT is ASP-complete. 

Proof. Regarding CNAE-3SAT as a function problem (see the remark earlier in this section),
it is easily seen that deciding if a truth assignment to an instance of CNAE-3SAT satisfies
this instance takes polynomial time. Hence, CNAE-3SAT is in FNP. Now, let $\psi$ be an
instance of the ASP-complete problem 3SAT over the variables $V = \{x_1, \ldots, x_n\}$. To show
that CNAE-3SAT is ASP-complete, we reduce $\psi$ to an instance $\psi'$ of CNAE-3SAT over an
expanded set of variables $V' = V \cup \{x_{n+1}, \ldots, x_{n+m}\}$, where $m$ is the number of clauses in
$\psi$. Let $x_{n+k}$ be a new variable chosen for the clause $(a_k \vee b_k \vee c_k)$ of $\psi$. We obtain $\psi'$ from
$\psi$ by replacing each clause $(a_k \vee b_k \vee c_k)$ with the following 4 clauses:

\[ (a_k \vee b_k \vee x_{n+k}) \wedge (\bar{x}_{n+k} \vee c_k \vee F) \wedge (a_k \vee x_{n+k} \vee T) \wedge (b_k \vee x_{n+k} \vee T). \]

Clearly, this reduction can be done in polynomial time. The size of $\psi'$ is polynomial in
the size of $\psi$, and a straightforward check shows that $\psi$ is satisfiable if and only if $\psi'$ is
satisfiable. Furthermore, for each clause, $x_{n+k}$ is uniquely determined by the truth values of
$a_k$ and $b_k$, since the reduction makes $x_{n+k}$ equivalent to $\overline{a_k \vee b_k}$: the last two clauses forbid
$x_{n+k}$ from being true together with $a_k$ or $b_k$, and the first clause forbids $a_k$, $b_k$, and $x_{n+k}$
from all being false. Hence, each truth assignment that satisfies $\psi$ can be mapped to a unique
valid truth assignment of $\psi'$. Conversely, each valid truth assignment of $\psi'$ is mapped to a
unique truth assignment of $\psi$ by restricting it to V. It now follows that the described
reduction from 3SAT to CNAE-3SAT is an ASP-reduction, thereby completing the proof of
this theorem. □
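
The bijection claimed in the proof can be sanity-checked by brute force on a single clause. The sketch below is an illustration under stated assumptions, not part of the paper: it reads the second clause of the gadget with the negated literal of the new variable (the only reading under which the gadget is satisfiable exactly when a_k, b_k, or c_k is true), and the function names are hypothetical.

```python
from itertools import product

def nae(*values):
    """Not-all-equal: the Boolean arguments take both truth values."""
    return len(set(values)) == 2

def gadget_ok(a, b, c, x):
    """NAE-satisfaction of the four CNAE-3SAT clauses replacing (a or b or c).

    x plays the role of the new variable x_{n+k}; T and F are the constants.
    Assumption: the second clause contains the negation of x.
    """
    T, F = True, False
    return (nae(a, b, x)
            and nae(not x, c, F)
            and nae(a, x, T)
            and nae(b, x, T))

def extensions(a, b, c):
    """All values of the new variable that make the gadget valid."""
    return [x for x in (False, True) if gadget_ok(a, b, c, x)]

# For each of the 8 truth values of (a, b, c), list the valid extensions.
table = {abc: extensions(*abc) for abc in product((False, True), repeat=3)}
```

Under these assumptions every satisfying assignment of the 3SAT clause has exactly one extension, namely x = not (a or b), and a falsifying assignment has none, which is the one-to-one correspondence an ASP-reduction needs.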

The Betweenness problem. The decision problem Betweenness, which we introduce
next, asks whether or not a given finite set can be totally ordered such that a collection of
constraints, given in the form of triples, is satisfied.

Problem: BETWEENNESS

Instance: A finite set A and a collection C of ordered triples (a, b, c) of distinct
elements from A such that each element of A occurs in at least one triple from
C.

Question: Does there exist a betweenness ordering $f$ of A for C; that is, a
one-to-one function $f : A \to \{1, 2, \ldots, |A|\}$ such that for each triple (a, b, c) in C
either $f(a) < f(b) < f(c)$ or $f(c) < f(b) < f(a)$?

Loosely speaking, for each triple (a, b, c), the element b lies between a and c in a betweenness
ordering of A for C. Betweenness has been shown to be NP-complete [22]. Similar to
NAE-3SAT, notice that if there is a solution, say $a_1 < a_2 < \ldots < a_s$, to an instance of
Betweenness, then there is always a second solution $a_s < \ldots < a_2 < a_1$ to that instance
that can clearly be calculated in polynomial time. Therefore, Betweenness is not
ASP-complete.
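
As a small illustration of the reversal argument, the following Python sketch checks a candidate ordering against a set of triples and confirms that the reversed ordering passes the same check. The function name and the sample instance are hypothetical and chosen only for this example.

```python
def is_betweenness_ordering(order, triples):
    """Return True if, for every triple (a, b, c), b lies between a and c.

    order: sequence listing each element exactly once, from first to last.
    triples: iterable of 3-tuples of elements of `order`.
    """
    pos = {element: index for index, element in enumerate(order, start=1)}
    return all(
        pos[a] < pos[b] < pos[c] or pos[c] < pos[b] < pos[a]
        for a, b, c in triples
    )

# Hypothetical instance on A = {a, b, c, d, e}.
triples = [("a", "b", "c"), ("b", "c", "d"), ("e", "a", "b")]
order = ["e", "a", "b", "c", "d"]
reversed_order = order[::-1]
```

The condition in the check is symmetric in the first and last component of each triple, so reversing any valid ordering keeps it valid.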
We next introduce a natural variant of Betweenness, called cBetweenness, that
is ASP-complete (see Section 3.1). An instance of cBetweenness differs from an instance
of Betweenness in that the former contains two constants, say m and M, and each
betweenness ordering has m as its first and M as its last element. We say that m is the
minimum and M the maximum of each betweenness ordering.

Problem: CBETWEENNESS

Instance: A finite set A and a collection C of ordered triples (a, b, c) of distinct
elements from $A \cup \{M, m\}$ with $m, M \notin A$ such that each element of $A \cup \{M, m\}$
occurs in at least one triple from C.

Question: Does there exist a betweenness ordering $f$ of $A \cup \{M, m\}$ for C; that is,
a one-to-one function $f : A \cup \{m, M\} \to \{0, 1, 2, \ldots, |A| + 1\}$ such that
for each triple (a, b, c) in C either $f(a) < f(b) < f(c)$ or $f(c) < f(b) < f(a)$, and
$f(m) = 0$ and $f(M) = |A| + 1$?

Although an instance $\psi$ of cBetweenness can have several betweenness orderings, note
that if $a_0 < a_1 < a_2 < \cdots < a_{|A|+1}$ is a betweenness ordering for $\psi$ with $a_0 = m$ and
$a_{|A|+1} = M$, then $a_{|A|+1} < \cdots < a_2 < a_1 < a_0$ is not such an ordering. This is because m
must be the minimal element, and M the maximal.
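
To make the "another solution" question for cBetweenness concrete, here is a minimal brute-force sketch in Python. It is exponential in |A| and is meant only for tiny instances; the function names and the two sample instances are assumptions introduced for illustration.

```python
from itertools import permutations

def cbetweenness_solutions(elements, triples, m="m", M="M"):
    """Yield every ordering of elements plus {m, M} that starts with m,
    ends with M, and places b between a and c for each triple (a, b, c)."""
    for middle in permutations(elements):
        order = (m, *middle, M)
        pos = {element: index for index, element in enumerate(order)}
        if all(pos[a] < pos[b] < pos[c] or pos[c] < pos[b] < pos[a]
               for a, b, c in triples):
            yield order

def has_another_solution(elements, triples, known):
    """Decide by exhaustive search whether a solution other than `known` exists."""
    return any(solution != tuple(known)
               for solution in cbetweenness_solutions(elements, triples))

A = ["a", "b", "c"]
# Hypothetical instance whose triples pin down a single ordering.
tight = [("m", "a", "b"), ("a", "b", "M"), ("b", "c", "M")]
# Hypothetical instance that only forces each element between m and M.
loose = [("m", "a", "M"), ("m", "b", "M"), ("m", "c", "M")]
```

In the first instance the known ordering m, a, b, c, M is the only solution, while in the second every arrangement of a, b, c between m and M is a solution, so a second solution exists.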



2.2 Phylogenetics 

This section provides preliminaries in the context of phylogenetics. For a more thorough 
overview, we refer the interested reader to Semple and Steel [26].

Phylogenetic trees and subtrees. An unrooted phylogenetic X-tree T is a connected
acyclic graph whose leaves are bijectively labeled with elements of X and have degree 1.
Furthermore, T is binary if each non-leaf vertex has degree 3. The set X is the label set of
T and is denoted by L(T).

Now let T be an unrooted phylogenetic X-tree, and let X' be a subset of X. The
minimal subtree of T that connects all elements in X' is denoted by T(X'). Furthermore,
the restriction of T to X', denoted by T|X', is the phylogenetic tree obtained from T(X')
by contracting degree-2 vertices.

Throughout this paper, we will use the terms 'unrooted phylogenetic tree' and
'phylogenetic tree' interchangeably.

Quartets. A quartet is an unrooted binary phylogenetic tree with exactly four leaves.
For example, let q be a quartet whose label set is {a, b, c, d}. We write ab|cd (or
equivalently, cd|ab) if the path from a to b does not intersect the path from c to d. Similarly
to the label set of a phylogenetic tree, L(q) denotes the label set of q, which is {a, b, c, d}.
Now, let $Q = \{q_1, q_2, \ldots, q_n\}$ be a set of quartets. We write L(Q) to denote the union
$L(q_1) \cup L(q_2) \cup \ldots \cup L(q_n)$.

Compatibility. Let T be a phylogenetic tree whose leaf set is a superset of L(q). Then
T displays q if q is isomorphic to T|L(q). Furthermore, T displays a set Q of quartets if T
displays each element of Q, in which case Q is said to be compatible. Lastly, $\langle Q \rangle$ denotes the
set of all unrooted binary phylogenetic trees that display Q and whose label set is precisely
L(Q).
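
Checking whether a given tree displays a set of quartets is straightforward, which is what places the associated function problem in FNP. The Python sketch below is illustrative and rests on a standard equivalence that is assumed here rather than quoted from the paper: a tree containing the labels a, b, c, d displays ab|cd exactly when the path from a to b and the path from c to d share no vertex. The tree, the vertex names, and the helper functions are hypothetical.

```python
from collections import deque

def build_adjacency(edges):
    """Adjacency dict of an undirected tree given as a list of vertex pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def path_vertices(adj, source, target):
    """Vertex set of the unique source-target path in a tree (BFS + backtrack)."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == target:
            break
        for neighbour in adj[vertex]:
            if neighbour not in parent:
                parent[neighbour] = vertex
                queue.append(neighbour)
    vertices = set()
    vertex = target
    while vertex is not None:
        vertices.add(vertex)
        vertex = parent[vertex]
    return vertices

def displays(adj, quartet):
    """quartet ((a, b), (c, d)) encodes ab|cd."""
    (a, b), (c, d) = quartet
    return path_vertices(adj, a, b).isdisjoint(path_vertices(adj, c, d))

def displays_all(adj, quartets):
    return all(displays(adj, q) for q in quartets)

# Hypothetical binary tree on leaves a..e with internal vertices u, v, w.
tree = build_adjacency([("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"),
                        ("v", "w"), ("d", "w"), ("e", "w")])
Q = [(("a", "b"), ("c", "d")), (("a", "b"), ("d", "e")), (("a", "c"), ("d", "e"))]
```

Each path computation is linear in the size of the tree, so verifying a candidate tree against Q takes polynomial time.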

The concept of compatibility leads to the following decision problem, which has been
shown to be NP-complete [27].

Problem: QUARTET COMPATIBILITY
Instance: A set Q of quartets.
Question: Is Q compatible?

The next problem was originally posed by Steel [28] and is a natural extension of
Quartet Compatibility.

Problem: Quartet Challenge

Instance: A binary phylogenetic A-tree T and a set Q of quartets on A such
that T displays Q.

Question: Is T the unique phylogenetic A-tree that displays Q?

Remark. Given a binary phylogenetic A-tree T and a set Q of quartets on A such that 
T displays Q, deciding whether another solution exists is the complement question of the 
Quartet Challenge. That is, a no answer to an instance of the first question translates 
to a yes answer to the same instance of the Quartet Challenge and vice versa. 

We end this section by highlighting the relationship between the two problems Quartet
Challenge and Quartet Compatibility. Let Q be an instance of Quartet
Compatibility. If Q is compatible, then there exists an unrooted phylogenetic tree T with label
set L(Q) that displays Q. This naturally leads to the question whether T is the unique
such tree on L(Q), which is exactly the question of the Quartet Challenge. To make
progress towards resolving the complexity of this challenge, we will first show that Quartet
Compatibility is ASP-complete. Thus, by [30, Theorem 3.4], the decision problem
that corresponds to Quartet Compatibility, i.e. asking whether another solution exists,
is NP-complete. Now, recalling the last remark, this in turn implies that the Quartet
Challenge is coNP-complete. Lastly, note that if T is the unique tree with label set L(Q)
that displays Q, then T is binary.



3 ASP-Completeness Results 
3.1 ASP-Reduction for cBetweenness 

In this section, we focus on establishing an ASP-reduction from CNAE-3SAT to
cBetweenness. This reduction gives that cBetweenness is ASP-complete (see Theorem 3.1).
We note that, by a similar argument, we can also reduce NAE-3SAT to Betweenness.
Let $\psi$ be an instance of CNAE-3SAT consisting of a conjunction of 3-literal clauses
$\{a_k \vee b_k \vee c_k : 1 \le k \le l\}$ on the set of variables $\{x_1, \ldots, x_n\}$ and a subset of the constants
$\{T, F\}$. Let $\{-x_1, -x_2, \ldots, -x_n, x_1, x_2, \ldots, x_n\}$ be the set of corresponding literals of $\psi$,
where $-x_i$ (instead of $\bar{x}_i$) denotes the negation of $x_i$. We next build an instance $\psi'$ of
cBetweenness.

In the following, we think of an ordering as setting symbols on a line segment where
the symbol X is the center; i.e. X represents '0'. Furthermore, we refer to the region
left of X as the negative side and to the region right of X as the positive side of the line
segment. For every variable $x_i$ with $i \in \{1, 2, \ldots, n\}$, we preserve the two symbols $x_i$ and
$-x_i$ and introduce two new symbols $M_i$ and $-M_i$, which are auxiliary symbols that mark the
midpoints between the $x_i$ and $-x_i$ symbols (see Figure 2). Moreover, if T or F is contained
in any clause of $\psi$, then we also introduce the symbol M or m, respectively, such that M
represents the largest and m the smallest value on the line segment (for details, see below).
Intuitively, if the literal $x_i$ is assigned to true, then the symbol $x_i$ is assigned to the positive
side and the symbol $-x_i$ to the negative side. Otherwise, if the literal $x_i$ is assigned to false,
then the symbol $x_i$ is assigned to the negative side and the symbol $-x_i$ to the positive side.

The following triples fix X as '0'. For every $i \le n$, they put $x_i$ and $-x_i$ on either side of
X. They also put $M_i$ and $-M_i$ on either side of X.

\[ \begin{array}{ll} (-x_i, X, x_i) & \text{for all } i \text{ such that } 1 \le i \le n \\ (-M_i, X, M_i) & \text{for all } i \text{ such that } 1 \le i \le n \end{array} \tag{1} \]

The next set of triples put an order between the $x_i$ and $M_i$ symbols. They establish that
$|x_i| < |x_{i+1}|$ for every $i$, $1 \le i < n$, where we interpret $|\cdot|$ to be the distance to X (i.e. '0')
under the induced ordering. Also, they fix $M_i$ and $-M_i$ as middle points between $x_i$ (or
$-x_i$) and $x_{i-1}$ (or $-x_{i-1}$) (see Figure 2).

\[ \begin{array}{lll} (-M_i, x_{i-1}, M_i) & (-M_i, -x_{i-1}, M_i) & \text{for all } i \text{ such that } 2 \le i \le n \\ (-x_i, M_i, x_i) & (-x_i, -M_i, x_i) & \text{for all } i \text{ such that } 1 \le i \le n \end{array} \tag{2} \]

We will require that either both $x_i$ and $M_i$ are on the positive side, or on the negative side.

\[ (x_i, X, -M_i) \qquad (-x_i, X, M_i) \qquad \text{for all } i \text{ such that } 1 \le i \le n \tag{3} \]
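
The effect of the triples in Equations 1-3 can be illustrated with a short Python sketch. It is a sketch under assumptions, not the construction's definitive form: the second row of Equation 2 is read as the pair (-x_i, M_i, x_i), (-x_i, -M_i, x_i); the symbols are placed at x_i = ±i and M_i = ±(i - 1/2) with M_i on the same side as x_i; and all names (`skeleton_triples`, `placement`, the string encoding "-s" for a negated symbol) are hypothetical.

```python
from itertools import product

def between(pos, a, b, c):
    """True if symbol b lies strictly between symbols a and c."""
    return pos[a] < pos[b] < pos[c] or pos[c] < pos[b] < pos[a]

def skeleton_triples(n):
    """Triples of Equations 1-3 for variables x1..xn."""
    triples = []
    for i in range(1, n + 1):
        x, M = f"x{i}", f"M{i}"
        triples += [("-" + x, "X", x), ("-" + M, "X", M)]              # Equation 1
        if i >= 2:
            prev = f"x{i - 1}"
            triples += [("-" + M, prev, M), ("-" + M, "-" + prev, M)]  # Equation 2, row 1
        triples += [("-" + x, M, x), ("-" + x, "-" + M, x)]            # Equation 2, row 2
        triples += [(x, "X", "-" + M), ("-" + x, "X", M)]              # Equation 3
    return triples

def placement(sigma):
    """Assumed positions: x_i at +i if true, -i if false; M_i at the same
    side as x_i at distance i - 0.5; X at 0; negated symbols mirrored."""
    pos = {"X": 0.0}
    for i, value in sigma.items():
        sign = 1 if value else -1
        pos[f"x{i}"], pos[f"-x{i}"] = sign * i, -sign * i
        pos[f"M{i}"], pos[f"-M{i}"] = sign * (i - 0.5), -sign * (i - 0.5)
    return pos

def satisfies_skeleton(pos, n):
    return all(between(pos, *triple) for triple in skeleton_triples(n))

# The assignment sigma_1 of Figure 1: x1, x2 true and x3, x4 false.
sigma_1 = {1: True, 2: True, 3: False, 4: False}
pos_1 = placement(sigma_1)

# Moving M3 to the side opposite x3 should violate Equation 3.
swapped = dict(pos_1)
swapped["M3"], swapped["-M3"] = pos_1["-M3"], pos_1["M3"]
```

With these positions every truth assignment yields a placement that satisfies all skeleton triples, while swapping a single M_i to the wrong side breaks Equation 3.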



Original Encoding: Given a clause $(a_k \vee b_k \vee c_k)$ of $\psi$, we assume for the remainder of
Section 3.1 that $a_k \in \{-x_i, x_i\}$, $b_k \in \{-x_{i'}, x_{i'}\}$ and $c_k \in \{-x_{i''}, x_{i''}\}$ such that $i < i' < i''$.
Since there are at most two constants per clause, we also assume that $a_k$ is never a constant
and that, if $b_k$ is a constant, then $c_k$ is a constant. Lastly, we assume that the whole set of
clauses of $\psi$ is ordered lexicographically.

To guarantee the 'not-all-equal' condition, we use the original encoding of Opatrny [22].
For each clause $a_k \vee b_k \vee c_k$ with $k \in \{1, 2, \ldots, l\}$, where $l$ is the number of clauses in $\psi$,
we add a new symbol $Z_k$ together with the following triples:

\[ (a_k, Z_k, b_k) \qquad (c_k, X, Z_k) \qquad \text{for all } k \text{ such that } 1 \le k \le l. \tag{4} \]
Figure 2: A mapping of the first set of clauses under the truth assignment $\sigma_1$ from Figure 1.
The top blue variables are the initial variables from the instance of CNAE-3SAT. For a given
truth assignment, the ordering of the $x_i$ and $M_i$ symbols is fixed by the triples in Equations 1-2.
However, without additional triples (and auxiliary symbols), the $Z_k$ symbols have overlapping
ranges (as indicated by the lines with arrows) and, as such, several possible orderings for a single
truth assignment.



If a clause contains a constant F or T, it gets substituted by m or M, respectively, in the
triple. For instance, a clause $a_k \vee b_k \vee T$ generates the two triples $(a_k, Z_k, b_k)$ and $(M, X, Z_k)$.
These triples force $Z_k$ to be on the negative side.

The triples of Equation 4 say that at least one of the literals of the $k$-th clause is assigned to
true and one to false (see Figure 2). When viewed in terms of the possible truth assignments
to the variables in the original clause, the truth assignment $a_k = b_k = c_k$ = true is eliminated
since, for all three initial literals to be assigned true, both $c_k$ and $Z_k$ would be assigned to
the positive side of X, violating the second triple of Equation 4. By a similar argument,
the truth assignment $a_k = b_k = c_k$ = false is eliminated. This leaves six other possible
satisfying truth assignments.
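
The elimination argument can be replayed numerically. In the Python sketch below, which is an illustration and not part of the construction, the three literals of a clause sit at signed integer positions (positive for true, negative for false), X sits at 0, and we search a grid of non-integer candidate positions for Z_k. The function names, the grid, and the indices 1 < 2 < 3 are assumptions made for the example.

```python
from itertools import product

def between(p, q, r):
    """True if position q lies strictly between positions p and r."""
    return p < q < r or r < q < p

def z_exists(i, j, k, sign_a, sign_b, sign_c):
    """Is there a position z with (a, Z, b) and (c, X, Z) satisfied?

    The literals a, b, c are placed at sign_a*i, sign_b*j, sign_c*k with
    i < j < k; a sign of +1 means the literal is true, -1 means false.
    """
    a, b, c = sign_a * i, sign_b * j, sign_c * k
    # Candidate positions on a quarter grid, skipping the integer points
    # occupied by the literal symbols and by X.
    candidates = [q / 4 for q in range(-4 * k, 4 * k + 1) if q % 4 != 0]
    return any(between(a, z, b) and between(c, 0, z) for z in candidates)

feasible = {signs: z_exists(1, 2, 3, *signs)
            for signs in product((1, -1), repeat=3)}
```

Exactly the six sign patterns in which the literals are not all equal admit a position for Z_k, matching the count stated above.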

The triples of Equations 1-4 are not sufficient to prove ASP-completeness of cBetweenness
since different orderings of cBetweenness can be reduced to the same satisfying truth
assignment of the CNAE-3SAT instance (see Figure 2). This happens since the symbols in
$\{Z_1, Z_2, \ldots, Z_l\}$ are not fixed with respect to the $x_i$ and $-x_i$ symbols or to one another. In
what follows, we will show how to fix the ordering of these symbols.

Fixing Auxiliary Symbols in the Ordering of the Initial Symbols: We next introduce
new symbols $-Z_1, \ldots, -Z_l$, where $l$ is the number of clauses in $\psi$. From now on, we
will establish a set of triples for every clause. We assume that the triples up to the $(k-1)$-th
clause have been defined, and now we will give the set of triples of the $k$-th clause. The
intuition is that we want $-Z_k$ to be opposite to $Z_k$ with respect to X. We add:

\[ (-a_k, -Z_k, -b_k) \qquad (-c_k, X, -Z_k) \tag{5} \]

To fix the position of $Z_k$ and $-Z_k$ for every $1 \le k \le l$ relative to the $x_i$ and $M_i$ symbols,
we need more triples. Recall that $a_k \in \{-x_i, x_i\}$. We add the following triples saying that
$|Z_k| < |M_{i+1}|$.

\[ (-M_{i+1}, Z_k, M_{i+1}) \qquad (-M_{i+1}, -Z_k, M_{i+1}) \tag{6} \]

The following triples say that $|M_i| < |Z_k|$.

\[ (-Z_k, M_i, Z_k) \qquad (-Z_k, -M_i, Z_k) \tag{7} \]

Figure 3: Equations 6-7 limit the ranges of the $Z_k$ symbols but do not completely fix the ordering.
Continuing the example from Figure 1, we show the possible ordering for the $\sigma_1$ truth assignment.
Note that the order of $Z_1$ and $Z_2$ is not fixed and additional triples (and variables) are needed.

These triples eliminate the interval $[-M_i, M_i]$ for the positions of $Z_k$ and $-Z_k$,
respectively. Since the $M_i$ symbols occur as midpoints between the $x_i$ symbols, we have restricted
each $Z_k$ to be 'near' $a_k$ or $-a_k$; i.e. $Z_k$ is 'near' $-x_i$ or $x_i$ on the line segment. As a
consequence of Equations 6 and 7, we have $|M_i| < |Z_k| < |x_i|$ or $|x_i| < |Z_k| < |M_{i+1}|$. At this point, we
have the symbols $Z_k$ and $-Z_k$ in a tight interval between consecutive positions, except for
the truth assignments $a_k = c_k$ = false and $b_k$ = true, and $a_k = c_k$ = true and $b_k$ = false. In
these cases, both $|M_i| < |Z_k| < |x_i|$ and $|x_i| < |Z_k| < |M_{i+1}|$ are still possible. See Figure 3
for an example. We therefore add:

\[ (Z_k, -a_k, c_k) \qquad (-Z_k, a_k, -c_k) \tag{8} \]

which, in these cases, fixes the $Z_k$ such that $|x_i| < |Z_k| < |M_{i+1}|$.
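
The way Equations 4-8 together pin $Z_k$ to a single slot next to $x_i$ can be checked on a small grid. The Python sketch below is a hedged illustration: it assumes the placement $|x_i| = i$, $|M_i| = i - 1/2$, $|M_{i+1}| = i + 1/2$, places $-Z_k$ as the mirror image of $Z_k$, and tests one representative position in each of the four candidate slots around $\pm x_i$. The function name and the choice of representatives are hypothetical.

```python
from itertools import product

def between(p, q, r):
    """True if position q lies strictly between positions p and r."""
    return p < q < r or r < q < p

def z_slots(i, j, k, sign_a, sign_b, sign_c):
    """Representative positions for Z_k that satisfy Equations 4-8.

    Literals a, b, c sit at sign_a*i, sign_b*j, sign_c*k with i < j < k,
    X sits at 0, and -Z_k is assumed to sit at the mirror image of Z_k.
    """
    a, b, c = sign_a * i, sign_b * j, sign_c * k
    m_lo, m_hi = i - 0.5, i + 0.5  # assumed |M_i| and |M_{i+1}|
    slots = []
    for z in (i - 0.25, i + 0.25, -(i - 0.25), -(i + 0.25)):
        ok = (between(a, z, b) and between(c, 0, z)                        # Equation 4
              and between(-a, -z, -b) and between(-c, 0, -z)               # Equation 5
              and between(-m_hi, z, m_hi) and between(-m_hi, -z, m_hi)     # Equation 6
              and between(-z, m_lo, z) and between(-z, -m_lo, z)           # Equation 7
              and between(z, -a, c) and between(-z, a, -c))                # Equation 8
        if ok:
            slots.append(z)
    return slots

slots_by_pattern = {signs: z_slots(1, 2, 3, *signs)
                    for signs in product((1, -1), repeat=3)}
```

Under these assumptions each of the six not-all-equal patterns leaves exactly one slot, and in the two patterns singled out above (a_k and c_k equal, b_k different) the surviving slot lies between $|x_i|$ and $|M_{i+1}|$.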



Fixing the Order of Auxiliary Symbols Among Themselves: With the $Z_k$ and $-Z_k$
being fixed around $x_i$ and $-x_i$, respectively, and recalling that $a_k \in \{-x_i, x_i\}$, we need to
address the case where the first literal of several clauses is either $x_i$ or $-x_i$. Note that these
correspond to consecutive clauses of $\psi$ since $\psi$ is ordered lexicographically. In this case, we
will use extra auxiliary symbols to nest the $Z_k$ symbols around $x_i$ and $-x_i$. The intuition for
these clauses is to partition the consecutive intervals into distinct regions for the different
$Z_k$ and $-Z_k$ symbols (see Figure [1]).

Let the $k$-th clause be $a_k \vee b_k \vee c_k$ and assume that it is not the only clause starting with $x_i$
or $-x_i$. We will add the symbols $L_k$, $-L_k$, $U_k$, $-U_k$ to create 4 new subintervals (two around
$x_i$ and two around $-x_i$) where we will fix $Z_k$ and $-Z_k$. Also, we will add the new symbols
$Y_k$ to be the mirror image of $Z_k$ with respect to $x_i$ (or $-x_i$), and $-Y_k$ to be the mirror image
of $-Z_k$ with respect to $x_i$ (or $-x_i$).
The following triples express that the symbols $Z_k$, $Y_k$, $L_k$ and $U_k$ are all placed on the
positive side, or on the negative side, and similarly for the negated symbols $-Z_k$, $-Y_k$, $-L_k$
and $-U_k$.

\[ \begin{array}{ll} (Z_k, X, -Y_k) & (-Z_k, X, Y_k) \\ (Z_k, X, -L_k) & (-Z_k, X, L_k) \\ (Z_k, X, -U_k) & (-Z_k, X, U_k) \end{array} \tag{9} \]

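First, if the $k$-th clause is the first clause in the ordering that starts with $x_i$ or $-x_i$, we need
the following triples saying that $|x_i| < |U_k|$ and $|L_k| < |x_i|$.

\[ \begin{array}{ll} (-U_k, x_i, U_k) & (-U_k, -x_i, U_k) \\ (-x_i, L_k, x_i) & (-x_i, -L_k, x_i) \end{array} \tag{10} \]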

Second, if the $k$-th clause is not the first one in the ordering that has $x_i$ or $-x_i$ as its
first literal (i.e. $a_{k-1}, a_k \in \{-x_i, x_i\}$), we proceed by introducing the following triples. By
induction, we assume that for the previous $(k-1)$-st clause, the subintervals were created
using $L_{k-1}$ and $U_{k-1}$ with the corresponding $Z_{k-1}$ and $Y_{k-1}$ in them. In this case, we need
the triples saying that $|Z_{k-1}| < |U_k|$ and $|Y_{k-1}| < |U_k|$ (instead of the first two triples of
Equation 10):

\[ \begin{array}{ll} (-U_k, Z_{k-1}, U_k) & (-U_k, -Z_{k-1}, U_k) \\ (-U_k, Y_{k-1}, U_k) & (-U_k, -Y_{k-1}, U_k) \end{array} \tag{11} \]

and the triples saying that $|L_k| < |Z_{k-1}|$ and $|L_k| < |Y_{k-1}|$ (instead of the last two triples of
Equation 10):

\[ \begin{array}{ll} (-Z_{k-1}, L_k, Z_{k-1}) & (-Z_{k-1}, -L_k, Z_{k-1}) \\ (-Y_{k-1}, L_k, Y_{k-1}) & (-Y_{k-1}, -L_k, Y_{k-1}) \end{array} \tag{12} \]

At this point, we have $Z_{k-1}$ either between two consecutive L symbols, or between two
consecutive U symbols. The same is true for $-Z_{k-1}$, $Y_{k-1}$ and $-Y_{k-1}$.

Now, we need triples to locate $Z_k$, $-Z_k$, $Y_k$ and $-Y_k$ in the newly created intervals. We
add triples saying that $L_k$ and $U_k$ are between $Y_k$ and $Z_k$, and similarly for the negated
symbols.

\[ \begin{array}{ll} (Z_k, U_k, Y_k) & (Z_k, L_k, Y_k) \\ (-Z_k, -U_k, -Y_k) & (-Z_k, -L_k, -Y_k) \end{array} \tag{13} \]

Since the L and U symbols are nested around $x_i$ and $-x_i$, this puts the $Y_k$ and $Z_k$ symbols
in their corresponding region around $x_i$ and $-x_i$.

Finally, we need to keep $Z_k$, $-Z_k$, $Y_k$, and $-Y_k$ from being too far from $x_i$ or $-x_i$ and
intruding in the region of $x_{i+1}$, $-x_{i+1}$, or $-x_{i-1}$. This was done already for the $Z_k$ and
$-Z_k$ symbols in Equations 6 and 7, and now we will do it for the $Y_k$ and $-Y_k$ symbols. This
is important in the case that the $k$th clause is the last one containing $x_i$ or $-x_i$ as its first
literal. To do this, we need additional triples saying that $|M_i| < |Y_k|$ and $|Y_k| < |M_{i+1}|$.

\[
\begin{array}{ll}
(Y_k, M_i, -Y_k) & (Y_k, -M_i, -Y_k) \\
(M_{i+1}, Y_k, -M_{i+1}) & (M_{i+1}, -Y_k, -M_{i+1})
\end{array}
\tag{14}
\]

Theorem 3.1. cBetweenness is ASP-complete.
Proof. Given a 3CNF instance of CNAE-3SAT $\{a_k \vee b_k \vee c_k : 1 \le k \le l\}$, on the set
of variables $\{x_1, \ldots, x_n\}$ and a subset of the constants $\{T, F\}$, we can create an instance
of cBetweenness by using Equations 1-14. We show how to build a bijection between
satisfying truth assignments for the formula instance and betweenness orderings that are
solutions to the cBetweenness instance.

To define the total betweenness orderings, we consider the line segment $[-n-1, n+1]$
and define a mapping $\phi_\sigma$ for every truth assignment $\sigma$ of the variables $x_1, \ldots, x_n$. While the
domain of $\sigma$ is $\{x_1, \ldots, x_n\}$, the domain of $\phi_\sigma$, $S = \mathrm{dom}(\phi_\sigma)$, is contained in

\[
\begin{array}{l}
\{x_1, \ldots, x_n, -x_1, \ldots, -x_n, m, M, X, M_1, \ldots, M_n, -M_1, \ldots, -M_n, \\
\ Z_1, \ldots, Z_l, -Z_1, \ldots, -Z_l, Y_1, \ldots, Y_l, -Y_1, \ldots, -Y_l, \\
\ L_1, \ldots, L_l, -L_1, \ldots, -L_l, U_1, \ldots, U_l, -U_1, \ldots, -U_l\}.
\end{array}
\]

The mapping $\phi_\sigma$ of truth assignments to orderings can now be defined in the following way.
First, as a general property of $\phi_\sigma$, let us say that $\phi_\sigma(-x) = -\phi_\sigma(x)$ for every symbol $x$ of
the instance. If $\sigma(x_i) = T$, then $\phi_\sigma(x_i) = i$ (and $\phi_\sigma(-x_i) = -i$), and otherwise $\phi_\sigma(x_i) = -i$
(and $\phi_\sigma(-x_i) = i$). At this point the symbols $x_i$ and $-x_i$ get fixed in the interval $[-n, n]$ on
opposite sides of 0. Note that the symbol $X$ represents 0 in the ordering, that is, $\phi_\sigma(X) = 0$.
This part of the definition of $\phi_\sigma$ fulfills Equation 1. Next, we put $m$ below $-n$, and $M$ above
$n$ (i.e. $\phi_\sigma(m) = -n-1$ and $\phi_\sigma(M) = n+1$), as is required for the definition of a betweenness
ordering for cBetweenness.

Next, we define the ordering function for the $M_i$ symbols. If $\phi_\sigma(x_i) > 0$, then
$\phi_\sigma(M_i) = \phi_\sigma(x_i) - 1/2$ and $\phi_\sigma(-M_i) = \phi_\sigma(-x_i) + 1/2$. If $\phi_\sigma(x_i) < 0$, then
$\phi_\sigma(M_i) = \phi_\sigma(x_i) + 1/2$ and $\phi_\sigma(-M_i) = \phi_\sigma(-x_i) - 1/2$. This definition fulfills Equations 2 and 3.
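The part of the mapping defined so far can be sketched in Python. This is only an illustration under stated assumptions: the function name `base_positions`, the string labels such as `"x1"` and `"-M1"`, and the dictionary encoding of the truth assignment are hypothetical choices made for this sketch, and the positions of the $Z_k$, $Y_k$, $L_k$ and $U_k$ symbols are not covered.

```python
from fractions import Fraction

def base_positions(sigma):
    """Place x_i, -x_i, M_i, -M_i, X, m and M on the segment [-n-1, n+1].

    sigma maps each variable index i in 1..n to True or False
    (an assumed encoding of the truth assignment).
    """
    n = len(sigma)
    half = Fraction(1, 2)
    phi = {"X": Fraction(0), "m": Fraction(-n - 1), "M": Fraction(n + 1)}
    for i in range(1, n + 1):
        x = Fraction(i) if sigma[i] else Fraction(-i)
        phi[f"x{i}"] = x
        phi[f"-x{i}"] = -x  # negated symbols are mirrored around X
        # M_i sits half a unit closer to 0 than x_i, on the same side.
        m_i = x - half if x > 0 else x + half
        phi[f"M{i}"] = m_i
        phi[f"-M{i}"] = -m_i
    return phi
```

Exact fractions are used instead of floats so that comparisons between half-integer positions stay exact.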

Now, for every $k$, $1 \le k \le l$, we have to fix the position of every $Z_k$ under the mapping
$\phi_\sigma$. Recall that each clause $a_k \vee b_k \vee c_k$ is ordered from smaller to larger index. We begin
with the case where only this clause begins with the variable represented by $a_k$. Then, no
additional auxiliary variables (i.e. $Y_k$, $L_k$, and $U_k$) were introduced, and only $Z_k$ has to be
placed in the order. There are six cases based on possible truth values assigned to $a_k$, $b_k$, and
$c_k$ (as noted above, $a_k = b_k = c_k = \text{false}$ and $a_k = b_k = c_k = \text{true}$ are not satisfying truth
assignments for an instance of CNAE-3SAT). If $\sigma(a_k) = \sigma(c_k)$ and $\sigma(b_k)$ has the opposite
value (i.e. the cases of $a_k = c_k = \text{false}$ and $b_k = \text{true}$, and $a_k = c_k = \text{true}$ and $b_k = \text{false}$),
then $Z_k$ will be set around $-\phi_\sigma(a_k)$, because of Equations 5 and 6, the fact that the index of
$b_k$ is bigger than that of $a_k$, and the fact that we have the triple $(Z_k, X, c_k)$ of Equation 3.
Note that the triples in Equation 7 fix the position of $\phi_\sigma(Z_k)$ to one side of $-\phi_\sigma(a_k)$. For
the remaining cases where $\sigma(a_k) \ne \sigma(c_k)$ we define $\phi_\sigma$ to place $Z_k$ near $\phi_\sigma(a_k)$, again by
Equations 3, 5 and 6. The general definition is as follows:

(4>a{ak) + \ if cr(a k ) ^ a(c k ) and a(b k ) = true 

4>a(a k ) -\ if cr(a k ) ^ (x(c fc ) and a(b k ) = false 

<f>a(-a k ) + | if cr(a k ) = cr(c k ) and a(b k ) = true 

<f>a(-a k ) - \ if a(a k ) = cr(c k ) and a(b k ) = false. 

If there is only one clause beginning with a given literal or its negation, then by the
triples in Equations 1 to 9 all the symbols (as well as their negations) are fixed.

There might be a number of clauses with the same variable in the first position of the
disjunction. The positions of the respective $Z_k$'s have to be fixed, as well as the auxiliary
variables $L_k$, $U_k$ and $Y_k$. By the triples in Equations 12, 13 and 14 the auxiliary variables
$L_k$ and $U_k$ (and the negative ones) are nested around $\phi_\sigma(a_k)$ and $\phi_\sigma(-a_k)$ forming intervals,
and $Z_k$ and $Y_k$ are set inside intervals of consecutive $L_k$'s or consecutive $U_k$'s (see Figure 1).
We assume that this is the $p$th clause that begins with the same variable. Then, as above,
the exact placement of these variables depends on $\sigma$. Define $\phi_\sigma$ as follows:

• Case 1: $\sigma(a_k) = \text{false}$, $\sigma(b_k) = \text{false}$, $\sigma(c_k) = \text{true}$:

(j) a (L k ) = CT (a fc ) + | <j>a{Uk) = 4>a(ak) ~ J x 

Mz k ) = M^) - MY k ) = M^) + %P 

• Case 2: $\sigma(a_k) = \text{false}$, $\sigma(b_k) = \text{true}$, $\sigma(c_k) = \text{false}$:



V(Lfc) = -0 CT (Ofc) - £ 4>a(U k ) = -(pa(ak) + 

%{z k ) = -faiak) + %i MYk) = -M<*k) - 



• Case 3: $\sigma(a_k) = \text{false}$, $\sigma(b_k) = \text{true}$, $\sigma(c_k) = \text{true}$:

M L k) = 4>a{ak) + | 4>a{U k ) = CT (a fc ) - g 

<f> a {Z h ) = <p a (a k ) + 3fi <f> a (Y k ) = CT (a fc ) - 2f± 

• Case 4: $\sigma(a_k) = \text{true}$, $\sigma(b_k) = \text{false}$, $\sigma(c_k) = \text{false}$:

(f) a (L k ) = CT (a fc ) - | 4>a(U k ) = 4> a (a k ) + ^ 

Mz k ) = M^) - MYk) = M^) + 

• Case 5: $\sigma(a_k) = \text{true}$, $\sigma(b_k) = \text{false}$, $\sigma(c_k) = \text{true}$:

M L k) = -<Pa{ak) + | M U k) = ~M a k) ~ I 

Mz k ) = -M<*k) - 2j jt MY k ) = -M^k) + 2j ^r 

• Case 6: $\sigma(a_k) = \text{true}$, $\sigma(b_k) = \text{true}$, $\sigma(c_k) = \text{false}$:



^ a (L k ) = <p a (a k ) - ^ <pa(U k ) = <p a (a k ) + 

Mz k ) = M^k) + ^ MYk) = M^k) - 



p_ 



For each of these cases, it can be easily checked that the auxiliary variables satisfy Equa- 
tions nm 

We have shown that given a satisfying truth assignment $\sigma$, there exists a mapping
$\phi_\sigma : S \to [-n-1, n+1]$. This mapping induces a natural ordering on $S$ that we will call
$f_\sigma : S \to [0, |S|+1]$. For $s_1, s_2 \in S$,

\[
f_\sigma(s_1) < f_\sigma(s_2) \iff \phi_\sigma(s_1) < \phi_\sigma(s_2).
\]
Figure 4: Equations 9-14 fix the locations of the $Z$ symbols, as well as the auxiliary variables,
uniquely in the order. The figure illustrates the location of these auxiliary symbols for the $\sigma_1$ truth
assignment from Figure 1.

Note that this fixes the minimal and maximal elements, $m$ and $M$, so that $f_\sigma(m) = 0$ and
$f_\sigma(M) = |S| + 1$. Further, we note that $f_\sigma$ satisfies Equations 1-14, since $\phi_\sigma$ satisfies them
and has the same ordering. We note that, by construction, each satisfying truth assignment
$\sigma$ uniquely defines $f_\sigma$. This construction can be done in quadratic time in $|S|$. Since the
number of auxiliary variables in $S$ is bounded polynomially in $n$, we have a polynomial-time
reduction from CNAE-3SAT to cBetweenness.

Now we need to show the converse. Given a betweenness ordering $f$ on $S$ that satisfies all
the triples obtained from an instance, we need to define an assignment such that for every clause
there is one literal satisfied and one literal falsified. We define the assignment as follows.
For all symbols $x_i$ (resp. $-x_i$) such that $f(x_i) > f(X)$ (resp. $f(-x_i) > f(X)$), we assign $x_i$
(resp. $-x_i$) to true. Similarly, for all symbols $x_i$ (resp. $-x_i$) such that $f(x_i) < f(X)$ (resp.
$f(-x_i) < f(X)$), we assign $x_i$ (resp. $-x_i$) to false. Furthermore, let $m$ be the constant $F$
and let $M$ be the constant $T$. Now, we have to see that the assignment obtained satisfies
at least one literal, and falsifies at least one literal, of every clause. Equation 5 ensures that
for a given clause $a_k \vee b_k \vee c_k$, not all three literals can be to the right of $X$ or to the left of
$X$. Therefore, the assignment created from the ordering will set at least one literal to true
and at least one literal to false. Lastly, we note that if two betweenness orderings on $S$, $f_1$
and $f_2$, yield identical truth assignments on $\{x_1, \ldots, x_n\}$, then, by Equations 1-4, $f_1$ and $f_2$
agree on the ordering of $\{x_1, \ldots, x_n, -x_1, \ldots, -x_n, m, M, X\}$. Further, Equations 5-14 fix
the remaining auxiliary variables, and as such, we must have $f_1 = f_2$. □
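The converse direction can be sketched in Python. The encoding here is an assumption of the sketch, not part of the construction: `f` is a dictionary from symbol labels to positions, literals are strings such as `"x1"` and `"-x1"`, the helper names are hypothetical, and the constants $T$ and $F$ are not handled.

```python
def assignment_from_ordering(f, n):
    """Read off a truth assignment: x_i is true iff it lies above X in f."""
    return {i: f[f"x{i}"] > f["X"] for i in range(1, n + 1)}

def literal_value(literal, assignment):
    """Evaluate a literal such as "x2" or "-x2" under the assignment."""
    negated = literal.startswith("-")
    value = assignment[int(literal.lstrip("-")[1:])]
    return not value if negated else value

def not_all_equal(clauses, assignment):
    """Check that every clause has at least one true and one false literal."""
    for clause in clauses:
        values = {literal_value(lit, assignment) for lit in clause}
        if len(values) != 2:
            return False
    return True
```

For instance, an ordering that places `x1` and `x3` above `X` and `x2` below it yields an assignment under which the clause `("x1", "x2", "x3")` is not-all-equal satisfied, while `("x1", "-x2", "x3")` is not.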



3.2 The Quartet Challenge is coNP-complete 

In this section, we show that the Quartet Challenge is coNP-complete. To this end,
we extend the original argument of Steel [27, Theorem 1] that showed that the related
question, Quartet Compatibility, is NP-complete. He reduced Betweenness to Quartet
Compatibility by mapping a betweenness ordering to a caterpillar tree (for a definition, see
below). Under that reduction, multiple solutions to an instance $\psi$ of the Quartet
Compatibility problem could correspond to a single solution of the Betweenness instance that is
obtained by transforming $\psi$. To prove that the Quartet Challenge is coNP-complete,
we extend Steel's polynomial-time reduction from Betweenness to an ASP-reduction from
cBetweenness to Quartet Compatibility.

Figure 5: The caterpillar $\alpha x_1|x_2 x_3 x_4|x_5 \beta$.

We first give some additional definitions. Let A be a finite set, and let C be a set of 
ordered triples of elements from A. Let (x, y) be a pair of elements of A such that no triple 
of C contains x and y. We say that (x, y) is a lost pair with regards to A and C. 
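As a small illustration, lost pairs can be enumerated directly from this definition. The sketch below is written under assumptions: the function name `lost_pairs` is hypothetical, elements are represented by strings, and pairs are treated as unordered.

```python
from itertools import combinations

def lost_pairs(elements, triples):
    """Return all unordered pairs of elements that appear together in no triple."""
    covered = set()
    for triple in triples:
        for x, y in combinations(triple, 2):
            covered.add(frozenset((x, y)))
    return [
        (x, y)
        for x, y in combinations(elements, 2)
        if frozenset((x, y)) not in covered
    ]
```

Applied to the four triples listed in the caption of Figure 6, namely `("m", "a", "b")`, `("M", "c", "b")`, `("c", "a", "m")` and `("m", "b", "M")` over the elements `m, a, b, c, M`, the only pair left uncovered is `("a", "M")`, in agreement with that caption.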

Let $T$ be an unrooted phylogenetic tree. A pair of leaves $(a, b)$ of $T$ is called a cherry
(or sibling pair) if $a$ and $b$ are leaves that are adjacent to a common vertex of $T$. Furthermore,
$T$ is a caterpillar if $T$ is binary and has exactly two cherries. Following [27], we
say that $T$ is an $\alpha\beta$-caterpillar if $\alpha$ and $\beta$ are leaves of distinct cherries of $T$. We write
$\alpha x_1|x_2 x_3 \ldots x_{n-1}|x_n \beta$ to denote the caterpillar whose two cherries are $(\alpha, x_1)$ and $(x_n, \beta)$,
and in which, for each $i \in \{1, 2, \ldots, n\}$, the path from $\alpha$ to $x_i$ consists of $i + 1$ edges. Now, let
$\alpha x_1|x_2 x_3 \ldots x_{n-1}|x_n \beta$ be an $\alpha\beta$-caterpillar. For each $i, j \in \{1, 2, \ldots, n\}$ with $i \ne j$, we say
that the path from $x_i$ to $x_j$ crosses $x_k$ if and only if $i < k < j$ or $j < k < i$. For example,
Figure 5 shows an $\alpha x_1|x_2 x_3 x_4|x_5 \beta$-caterpillar whose path from $x_1$ to $x_4$ crosses $x_2$ and $x_3$.
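The caterpillar notation lends itself to a very small sketch in Python. The representation is an assumption made here: a caterpillar is stored as the list of its leaves $x_1, \ldots, x_n$ in order, with $\alpha$ and $\beta$ left implicit, and both function names are hypothetical.

```python
def path_length_from_alpha(order, x):
    """Number of edges on the path from alpha to leaf x.

    order lists the leaves x_1, ..., x_n of the caterpillar
    alpha x_1 | x_2 ... x_{n-1} | x_n beta (alpha and beta are implicit).
    """
    return order.index(x) + 2  # x_i has 0-based index i - 1, giving i + 1 edges

def crossed_leaves(order, x, y):
    """Leaves crossed by the path from x to y, i.e. those strictly between them."""
    i, j = sorted((order.index(x), order.index(y)))
    return order[i + 1:j]
```

With `order = ["x1", "x2", "x3", "x4", "x5"]`, the call `crossed_leaves(order, "x1", "x4")` returns `["x2", "x3"]`, which matches the example of Figure 5.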



Before we prove the main result of this section (Theorem 3.3), we need a lemma. 



Lemma 3.2. Let $T$ be a phylogenetic tree, and let $ab|cd$ be a quartet that is displayed by $T$.
Then no element of $\{(a, c), (a, d), (b, c), (b, d)\}$ is a cherry of $T$.

Proof. By the definition of a quartet, the path from $a$ to $b$ in $T$ does not intersect the path
from $c$ to $d$ in $T$. Thus, no element of $\{(a, c), (a, d), (b, c), (b, d)\}$ is a cherry of $T$. □

Theorem 3.3. Quartet Compatibility is ASP-complete. 

Proof. We start by noting that it clearly takes polynomial time to decide whether or not
a phylogenetic tree $T$ displays a given set $Q$ of quartets, since it is sufficient to check if
$T|L(q) = q$ for each quartet $q$ in $Q$. Hence, Quartet Compatibility is in FNP.
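One way to carry out this polynomial-time check is sketched below in Python. Note that the sketch swaps in the four-point condition on path lengths in place of computing the restriction of $T$ to the four leaves directly; for trees the two tests agree. The adjacency-dictionary representation of the tree and all function names are assumptions of this sketch.

```python
from collections import deque

def distances_from(tree, source):
    """Breadth-first search: edge counts from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in tree[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def displays(tree, quartet):
    """True iff the tree displays the quartet ab|cd, given as ((a, b), (c, d))."""
    (a, b), (c, d) = quartet
    da, db, dc = (distances_from(tree, s) for s in (a, b, c))
    # Four-point condition: the a-b and c-d paths are vertex-disjoint exactly
    # when this pairing has strictly the smallest distance sum.
    return da[b] + dc[d] < min(da[c] + db[d], da[d] + db[c])

def displays_all(tree, quartets):
    """Check every quartet with three breadth-first searches each."""
    return all(displays(tree, q) for q in quartets)
```

Each quartet costs three searches that are linear in the size of the tree, so the whole check is polynomial in the size of $T$ and $Q$.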

To show that Quartet Compatibility is ASP-complete, we next describe an ASP-reduction
from cBetweenness to Quartet Compatibility. Let $\psi$ be an instance of
cBetweenness over a finite set $A = \{a_1, a_2, \ldots, a_s\} \cup \{m, M\}$. Let $C = \{\pi_1, \pi_2, \ldots, \pi_n\}$
be the set of triples of $\psi$, with $\pi_i = (b_i, c_i, d_i)$ for each $i \in \{1, 2, \ldots, n\}$, such that each
element of $A \cup \{m, M\}$ is contained in at least one triple. Recall that $m$ is the first and $M$
the last element of each betweenness ordering $f$ of $A \cup \{m, M\}$ for $C$; that is, $f(m) = 0$ and
$f(M) = |A| + 1$. For simplicity throughout this proof, let $A' = A \cup \{m, M\}$. Furthermore,
let $L = \{\tau_{n+1}, \tau_{n+2}, \ldots, \tau_{n'}\}$ be precisely the set of all lost pairs with regard to $A'$ and $C$,
where $\tau_i = (x_i, y_i)$ for each $i \in \{n+1, n+2, \ldots, n'\}$.
We next describe six sets of quartets:

1. Each triple $\pi_i = (b_i, c_i, d_i)$ in $C$ is represented by 6 quartets in

n n 

Qi = {JQtt, = \^{pip[\biCi,pibi\cidi,piCi\d^^ 
i=i i=i 

2. Each lost pair $\tau_i = (x_i, y_i)$ in $L$ is represented by 5 quartets in

n' n' 
i=n+l i=n+l 

3. Let $a_j, a_k \in A'$, and let $a$ be any fixed element of $A$. Set $Q_3$, $Q_4$, and $Q_5$ to be the
following:

n' i-1 

= [J U{P^lft' a '^ibi a '®9il37' a '*9il^°}: 

i=2j=l 
n' n' 

Qa = IJ Ufe^i*^'^i^ a '^i^ a }' and 
i=i i=i 

^ 5 = U U U^^i a J afc '*^i a J afc ^ 

i=i i=2 fc=i 

4. Let $a_i, a_j \in A$, and set $Q_6$ to be the following:

\[
Q_6 = \bigcup_{i=2}^{s} \bigcup_{j=1}^{i-1} \{\alpha m|a_i a_j,\ a_i a_j|M \beta\}.
\]

Noting that $n'$ is in the order of $O(|A|^3)$, the quartet set

\[
Q = \bigcup_{i=1}^{6} Q_i
\]

can be constructed in polynomial time.

We note that for Steel's original proof [27, Theorem 1], in which he describes a polynomial-time
reduction from Betweenness to Quartet Compatibility in order to show that the
latter decision problem is NP-complete, the construction of $Q_1$ is sufficient.

A straightforward analysis of the quartets in $Q_{\pi_i}$ and $Q_{\tau_i}$, respectively, shows that $\langle Q_{\pi_i} \rangle$
and $\langle Q_{\tau_i} \rangle$ contain the following phylogenetic trees, which are all $\alpha\beta$-caterpillars:

\[
\langle Q_{\pi_i} \rangle = \{\alpha p_i|p_i' b_i c_i d_i q_i|q_i' \beta,\ \alpha q_i|q_i' d_i c_i b_i p_i|p_i' \beta\}
\]

for each $i \in \{1, 2, \ldots, n\}$ and

\[
\langle Q_{\tau_i} \rangle = \{\alpha p_i|p_i' x_i y_i q_i|q_i' \beta,\ \alpha q_i|q_i' y_i x_i p_i|p_i' \beta\}
\]

for each $i \in \{n+1, n+2, \ldots, n'\}$.



Let $T$ be a phylogenetic tree of $\langle Q \rangle$. By $Q_1$ and $Q_2$, it is easily checked that either, if
$p_i$ and $p_i'$ are both crossed by the path from $\alpha$ to $b_i$ (resp. $x_i$) in $T$, then $q_i$ and $q_i'$ are both
crossed by the path from $\beta$ to $b_i$ (resp. $x_i$) in $T$, or, if $q_i$ and $q_i'$ are both crossed by the path
from $\alpha$ to $b_i$ (resp. $x_i$) in $T$, then $p_i$ and $p_i'$ are both crossed by the path from $\beta$ to $b_i$ (resp.
$x_i$) in $T$, for $i \in \{1, 2, \ldots, n\}$ (resp. $i \in \{n+1, n+2, \ldots, n'\}$). We refer to this property
of $T$ as the desired pq-property for $i$.



Let $V$ be the set $\{p_1, \ldots, p_{n'}, p_1', \ldots, p_{n'}', q_1, \ldots, q_{n'}, q_1', \ldots, q_{n'}'\}$, and let $T$ be a
phylogenetic tree in $\langle Q \rangle$. We continue with making several observations that will be important in
what follows:

1. The second and third quartet in $Q_1$, the second quartet in $Q_2$, and Lemma 3.2 imply
that $T$ does not have a cherry $(a, b)$ with $a, b \in A'$.

2. The quartets in $Q_5$ and Lemma 3.2 imply that $T$ does not have a cherry $(a, b)$ with
$a \in A'$ and $b \in V$.

3. The last two quartets in $Q_1$ and $Q_2$, the quartets in $Q_3$, the first quartet in $Q_4$, and
Lemma 3.2 imply that $T$ does not have a cherry $(a, b)$ with $a, b \in V$.



In conclusion, $T$ is an $\alpha\beta$-caterpillar. This observation leads to a number of additional
properties that are satisfied by $T$:

(i) By $Q_5$ and the desired pq-property for each $i \in \{1, 2, \ldots, n'\}$, the subtree $T(A')$ can
be obtained from $T$ by deleting exactly two of its edges.

(ii) Let $i \in \{1, 2, \ldots, n'\}$. Let $P$ contain each element of $(\{p_1, p_2, \ldots, p_{n'}, p_1', p_2', \ldots, p_{n'}'\} -
\{p_i, p_i'\})$ that is crossed by the path from $p_i$ (and $p_i'$) to $a$ in $T$ for any $a \in A'$. Then, by
$Q_3$, each element in $P$ has an index that is smaller than $i$. Analogously, let $P'$ contain
each element of $(\{q_1, q_2, \ldots, q_{n'}, q_1', q_2', \ldots, q_{n'}'\} - \{q_i, q_i'\})$ that is crossed by the path
from $q_i$ (and $q_i'$) to $a$ in $T$ for any $a \in A'$. Then, again by $Q_3$, each element in $P'$ has
an index that is smaller than $i$.

(iii) Let $a \in A'$. By the second and third quartet of $Q_4$, the path from $q_i$ (and $q_i'$) to $a$ does
not cross an element of $\{p_1, p_2, \ldots, p_{n'}, p_1', p_2', \ldots, p_{n'}'\}$ in $T$ for each $i \in \{1, 2, \ldots, n'\}$.

(iv) By (i)-(iii), the path from $p_i$ (resp. $q_i$) to $p_i'$ (resp. $q_i'$) in $T$ consists of 3 edges for each
$i \in \{1, 2, \ldots, n'\}$. In particular, by the last two quartets of $Q_1$ and $Q_2$, respectively,
the path from $\alpha$ to $p_i'$ (resp. $q_i'$) crosses $p_i$ (resp. $q_i$) and the path from $\beta$ to $p_i$ (resp.
$q_i$) crosses $p_i'$ (resp. $q_i'$) in $T$.
Figure 6: An $\alpha\beta$-caterpillar in $\langle Q \rangle$ that satisfies properties (i)-(v) for an instance of
cBetweenness that consists of the four triples $\pi_1 = (m, a, b)$, $\pi_2 = (M, c, b)$, $\pi_3 = (c, a, m)$, and
$\pi_4 = (m, b, M)$. Note that the associated set of lost pairs only contains $\tau_5 = (a, M)$. Details on
how to construct $Q$ are given in the proof of Theorem 3.3.



(v) By $Q_6$, the path from $m$ to $M$ crosses each element in $A$. Furthermore, neither the
path from $\alpha$ to $m$ nor the path from $\beta$ to $M$ crosses an element of $A$.

To illustrate, Figure 6 shows an $\alpha\beta$-caterpillar $T$ of $\langle Q \rangle$ that, consequently, satisfies
properties (i)-(v) for an instance of cBetweenness that contains the four triples $\pi_1 = (m, a, b)$,
$\pi_2 = (M, c, b)$, $\pi_3 = (c, a, m)$, and $\pi_4 = (m, b, M)$. Note that $T(A')$ can be obtained from $T$
by deleting the two edges $e$ and $e'$.

Now, let $T$ be a phylogenetic tree in $\langle Q \rangle$. Let $T'$ be the phylogenetic tree obtained from
$T$ by interchanging $\alpha$ and $\beta$ and, for each $i \in \{1, 2, \ldots, n'\}$, interchanging $p_i$ and $p_i'$ as well as $q_i$
and $q_i'$. Noting that $T'$ does display $Q \setminus Q_6$ but does not display $Q$ since property (v) is not
satisfied, the rest of this proof essentially consists of two claims.

Claim 1. Let $T$ and $T'$ be two elements of $\langle Q \rangle$. Then $T \ncong T'$ if and only if
$T|(A' \cup \{\alpha, \beta\}) \ncong T'|(A' \cup \{\alpha, \beta\})$.

Trivially, if $T \cong T'$, then in particular $T|(A' \cup \{\alpha, \beta\}) \cong T'|(A' \cup \{\alpha, \beta\})$. To prove the
claim it is therefore sufficient to show that, if $T \ncong T'$, then $T|(A' \cup \{\alpha, \beta\}) \ncong T'|(A' \cup \{\alpha, \beta\})$.
Assume the contrary. Then there exist two distinct elements $T$ and $T'$ in $\langle Q \rangle$ such that
$T|(A' \cup \{\alpha, \beta\}) \cong T'|(A' \cup \{\alpha, \beta\})$. Let $a$ be an element of $A'$. Since both $T$ and $T'$ are
$\alpha\beta$-caterpillars that satisfy properties (i)-(v), there exists an $i \in \{1, 2, \ldots, n'\}$ such that the
path from $\alpha$ to $a$ crosses $p_i$ and $p_i'$ in one of $T$ and $T'$, say $T$, while the path from $\alpha$ to $a$
crosses $q_i$ and $q_i'$ in $T'$. By the desired pq-property for each $i$ and property (i), note that
the path from $\beta$ to $a$ crosses $q_i$ and $q_i'$ in $T$, and the path from $\beta$ to $a$ crosses $p_i$ and $p_i'$ in
$T'$. If $i \in \{1, 2, \ldots, n\}$, let $S = \{\alpha, \beta, p_i, p_i', q_i, q_i', b_i, c_i, d_i\}$, and if $i \in \{n+1, n+2, \ldots, n'\}$,
let $S = \{\alpha, \beta, p_i, p_i', q_i, q_i', x_i, y_i\}$. Since $T|(S - \{p_i, p_i', q_i, q_i'\}) \cong T'|(S - \{p_i, p_i', q_i, q_i'\})$, it now
follows that either $T|S$ or $T'|S$ is not an element of $\langle Q_{\pi_i} \rangle$ (if $i \in \{1, 2, \ldots, n\}$) or $\langle Q_{\tau_i} \rangle$
(if $i \in \{n+1, n+2, \ldots, n'\}$), thereby contradicting that $T$ and $T'$ are both in $\langle Q \rangle$. This
completes the proof of Claim 1.

Claim 2. $Q$ is compatible if and only if $A'$ has a betweenness ordering $f$ for $C$ with $f(m) = 0$
and $f(M) = |A| + 1$. In particular, there is a bijection from the solutions of $\psi$ to the elements
in $\langle Q \rangle$.
First, suppose that $Q$ is compatible. Again, let $T$ be an unrooted binary phylogenetic
tree in $\langle Q \rangle$. Recall that the sets $\langle Q_{\pi_i} \rangle$ and $\langle Q_{\tau_i} \rangle$ both contain two $\alpha\beta$-caterpillars. Thus

\[
T|\{p_i, p_i', q_i, q_i', \alpha, \beta, b_i, c_i, d_i\}
\]

is isomorphic to one phylogenetic tree of $\langle Q_{\pi_i} \rangle$ for each $i \in \{1, 2, \ldots, n\}$, and

\[
T|\{p_i, p_i', q_i, q_i', \alpha, \beta, x_i, y_i\}
\]

is isomorphic to one phylogenetic tree of $\langle Q_{\tau_i} \rangle$ for each $i \in \{n+1, n+2, \ldots, n'\}$. Noting
that $T$ is an $\alpha\beta$-caterpillar, we next define a betweenness ordering of $A'$ for $C$. Let $T^*$ be
$T|(A' \cup \{\alpha, \beta\})$, and define $f : A' \to \{0, 1, 2, \ldots, |A| + 1\}$ such that $2 + f(a_j)$ denotes the number
of edges on the path from $\alpha$ to $a_j$ in $T^*$ for each $a_j \in A'$. Since $T$ displays $Q_{\pi_i}$ for each
$i \in \{1, 2, \ldots, n\}$ and the path from $b_i$ to $d_i$ crosses $c_i$ in both phylogenetic trees of $\langle Q_{\pi_i} \rangle$, it
follows that $f(b_i) < f(c_i) < f(d_i)$ or $f(d_i) < f(c_i) < f(b_i)$. Since this holds for each $\pi_i \in C$,
it follows that $f$ is a betweenness ordering of $A'$ for $C$. In particular, by property (v), we
have $f(m) = 0$ and $f(M) = |A| + 1$. Furthermore, by Claim 1 and the paragraph prior to
Claim 1, each element of $\langle Q \rangle$ is mapped to a distinct betweenness ordering of $A'$ for $C$
with $f(m) = 0$ and $f(M) = |A| + 1$.
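The step from the restricted caterpillar to an ordering can be sketched in Python. As an assumption of this sketch, the caterpillar restricted to $A'$, $\alpha$ and $\beta$ is given as the list of its leaves from the $\alpha$ end to the $\beta$ end, with $\alpha$ and $\beta$ omitted; both function names are hypothetical.

```python
def ordering_from_caterpillar(order):
    """Map the leaf order of the restricted caterpillar to the ordering f.

    The leaf at 0-based position j has a path of j + 2 edges from alpha,
    so requiring 2 + f(a) edges on that path gives f(a) = j.
    """
    return {leaf: position for position, leaf in enumerate(order)}

def is_betweenness_ordering(f, triples):
    """Check that c lies strictly between b and d for every triple (b, c, d)."""
    return all(f[b] < f[c] < f[d] or f[d] < f[c] < f[b] for b, c, d in triples)
```

For the instance in the caption of Figure 6, the leaf order `m, a, b, c, M` gives `f["m"] == 0` and `f["M"] == 4`, and all four triples are satisfied; swapping `a` and `b` violates the triple `("m", "a", "b")`.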

Second, suppose that $A'$ has a betweenness ordering for $C$, and let $f$ be one such ordering
with $f(m) = 0$ and $f(M) = |A| + 1$. Note that $f$ imposes an ordering on each lost pair $(x_i, y_i)$
such that either $f(x_i) < f(y_i)$ or $f(y_i) < f(x_i)$. Furthermore, recall that $n' = |C| + |L|$ and
$|A'| = s + 2$. Let $T_0$ be the unique $\alpha\beta$-caterpillar, whose label set is $A' \cup \{\alpha, \beta\}$, such that the
path from $\alpha$ to $a_j$ in $T_0$ contains $2 + f(a_j)$ edges for each $a_j \in A'$. Let $a$ be any element of $A$.
Next, we describe the algorithm BuildTree that iteratively constructs a series $T_1, T_2, \ldots, T_{n'}$
of $\alpha\beta$-caterpillars. Set $i$ to be 1. To obtain $T_i$ from $T_{i-1}$, proceed in the following way:

Let $P = \{p_1, p_2, \ldots, p_{i-1}, p_1', p_2', \ldots, p_{i-1}'\}$. We first define two edges $e_i$ and $e_i'$ in $T_{i-1}$. If
the path from $\alpha$ to $a$ in $T_{i-1}$ crosses an element of $P$, let $e_i = \{u, v\}$ be the first edge on this
path such that $u$ is adjacent to a leaf labeled with an element of $P$ and $v$ is adjacent to a
leaf labeled with an element that is not contained in $P$. Otherwise, choose $e_i$ to be the edge
that is incident with $\alpha$. Similarly, if the path from $\beta$ to $a$ in $T_{i-1}$ crosses an element of $P$,
let $e_i' = \{u', v'\}$ be the first edge on this path such that $u'$ is adjacent to a leaf labeled with
an element of $P$ and $v'$ is adjacent to a leaf labeled with an element that is not contained
in $P$. Otherwise, choose $e_i'$ to be the edge that is incident with $\beta$. Note that $e_i$ and $e_i'$ are
uniquely defined.

To obtain $T_i$ from $T_{i-1}$, we consider two cases. First, if $i \in \{1, 2, \ldots, n\}$ and $f(b_i) <
f(c_i) < f(d_i)$, or if $i \in \{n+1, n+2, \ldots, n'\}$ and $f(x_i) < f(y_i)$, subdivide the edge incident
with $\alpha$ twice and join each of the two newly created vertices with a new leaf labeled $p_i$
and $p_i'$, respectively, by introducing two new edges such that the path from $\alpha$ to $p_i'$ crosses
$p_i$. Furthermore, subdivide $e_i'$ twice and join each of the two newly created vertices with a
new leaf labeled $q_i'$ and $q_i$, respectively, by introducing two new edges such that the path
from $\beta$ to $q_i$ crosses $q_i'$. Second, if $i \in \{1, 2, \ldots, n\}$ and $f(d_i) < f(c_i) < f(b_i)$, or if $i \in
\{n+1, n+2, \ldots, n'\}$ and $f(y_i) < f(x_i)$, subdivide $e_i$ twice and join each of the two newly
created vertices with a new leaf labeled $q_i$ and $q_i'$, respectively, by introducing two new edges
such that the path from $\alpha$ to $q_i'$ crosses $q_i$. Furthermore, subdivide the edge incident with
$\beta$ twice and join each of the two newly created vertices with a new leaf labeled $p_i'$ and $p_i$,
respectively, by introducing two new edges such that the path from $\beta$ to $p_i$ crosses $p_i'$. A
specific example of the definition of $e_i$ and $e_i'$, respectively, and of how to obtain $T_i$ from
$T_{i-1}$ is shown in Figure 7.

Figure 7: The intermediate trees $T_3$ (top) and $T_4$ (bottom) that are obtained from applying the
algorithm BuildTree to the Betweenness instance that is described in the caption of Figure 6
for when $m < a < b < c < M$ is the given betweenness ordering. Note that $T_4$ is obtained from $T_3$
by subdividing twice the edge that is incident with $\alpha$ and $e_4'$, respectively. Furthermore, the tree
depicted in Figure 6 is obtained from $T_4$ by applying one more iteration of BuildTree.

If $i < n'$, increment $i$ by 1 and repeat; otherwise stop. In this way, we obtain a tree
$T_{n'}$ that displays $Q$ and, hence, $Q$ is compatible. In particular, $T_{n'}$ is an element of $\langle Q \rangle$.
Furthermore, again by Claim 1 and the paragraph prior to Claim 1, $T_{n'}$ is the unique tree
of $\langle Q \rangle$ that has the property that $T_{n'}|(A' \cup \{\alpha, \beta\}) \cong T_0$. Thus, each betweenness ordering
$f$ of $A'$ for $C$ with $f(m) = 0$ and $f(M) = |A| + 1$ is mapped to a distinct element of $\langle Q \rangle$.
This completes the proof of Claim 2.

It now follows that the presented transformation from an instance of cBetweenness to
an instance of Quartet Compatibility is an ASP-reduction that can be carried out in
polynomial time. Hence, Quartet Compatibility is ASP-complete. This completes the
proof of the theorem. □

Now recall that ASP-completeness implies NP-completeness of the corresponding decision
problem, say $\Pi$ [30, Theorem 3.4]. Since $\Pi$ is exactly the complementary question of the
Quartet Challenge (see last paragraph of Section 2), the next corollary immediately
follows.

Corollary 3.4. The Quartet Challenge is coNP-complete.



4 Conclusion 

In this paper, we have shown that the two problems cBetweenness and Quartet Com- 
patibility that have applications in computational biology are ASP-complete. Thus, given 
a betweenness ordering or a phylogenetic tree that displays a set of quartets, it is computa- 
tionally hard to decide if another solution exists to a problem instance of cBetweenness 
and Quartet Compatibility, respectively. If there is another solution, then this may imply
that a data set that underlies an analysis does not contain enough information to obtain
an unambiguous result. Furthermore, by Corollary 3.4, the ASP-completeness of Quartet
Compatibility implies that the Quartet Challenge, which is one of Mike Steel's $100 
challenges [2H], is coNP-complete. Lastly, due to [30, Theorem 3.4], regardless of how many 
solutions to an instance of cBetweenness or Quartet Compatibility are known, it is 
always NP-complete to decide whether an additional solution exists. 

Unless P=NP, the existence of efficient algorithms to exactly solve the above-mentioned 
two problems is unlikely. Nevertheless, there is a need to develop exact algorithms that solve 
small to medium sized problem instances and, most importantly, return all solutions. For ex- 
ample, it might be possible to start filling this gap by using fixed-parameter algorithms that 
have recently proven to be particularly useful to approach many questions in computational 
biology [15]. Alternatively, heuristics and polynomial-time approximation algorithms often
provide a valuable tool to efficiently approach problem instances of larger size. While Chor 
and Sudan [7] established a geometric approach to approximate a betweenness ordering that 
satisfies at least one half of a given set of constraints, the statement of Quartet Compat- 
ibility does not directly allow for an approximation algorithm since it is a recognition-type 
problem. Nevertheless, since compatible quartet sets are rare, the goal of a related problem 
is, given a set of quartets, to reconstruct a phylogenetic tree that displays as many quartets 
as possible. This problem is known as the Maximum Quartet Consistency problem. 
Despite its NP-hardness [2, 27], several exact algorithms (e.g. see [2, 29] and references
therein) as well as a polynomial-time approximation [20] exist. It would therefore be interesting
to investigate if these algorithms can be extended in a way such that they return
all solutions in order to analyze whether a unique phylogenetic tree displays a given set of 
compatible quartets. 

We end this paper by noting that the computational complexity of the Quartet Challenge
changes greatly if all elements in a set of quartets over $n$ taxa have a common taxon,
say $x$. By rooting each quartet at $x$, i.e. deleting the vertex labeled $x$ and its incident edge
and regarding the resulting degree-2 vertex as the root, we obtain a set $S$ of rooted triples
(rooted phylogenetic trees on three taxa). By applying the Build algorithm it can now be
checked in polynomial time if $S$ is compatible [26, Proposition 6.4.4]. Furthermore, there is
a unique rooted phylogenetic tree on $n$ taxa that displays $S$ if and only if Build returns a
rooted binary phylogenetic tree [5, Proposition 2]. If Build returns a rooted phylogenetic
tree that is not binary, then every refinement of this tree also displays $S$.
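To make the last remark concrete, here is a minimal Python sketch of the component-splitting recursion usually meant by Build, together with the binarity test used for uniqueness. It is an illustration under assumptions rather than a reference implementation: a rooted triple $ab|c$ is encoded as `(("a", "b"), "c")`, trees are returned as nested tuples, leaf labels are assumed to be strings, every triple is assumed to use only the given taxa, and the names `build` and `is_binary` are chosen for this sketch.

```python
def build(leaves, triples):
    """Return a tree displaying all rooted triples, or None if none exists.

    leaves: iterable of taxa; triples: iterable of ((a, b), c) meaning ab|c.
    A leaf is returned as itself, an internal node as a tuple of subtrees.
    """
    leaves = list(leaves)
    triples = list(triples)
    if len(leaves) == 1:
        return leaves[0]
    if len(leaves) == 2:
        return (leaves[0], leaves[1])
    # Union-find over the leaves: join a and b for every triple ab|c.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), c in triples:
        parent[find(a)] = find(b)
    blocks = {}
    for x in leaves:
        blocks.setdefault(find(x), []).append(x)
    if len(blocks) == 1:
        return None  # a single component: the triples are incompatible
    children = []
    for block in blocks.values():
        members = set(block)
        # Keep only the triples whose three taxa all lie in this block.
        inside = [t for t in triples
                  if t[0][0] in members and t[0][1] in members and t[1] in members]
        subtree = build(block, inside)
        if subtree is None:
            return None
        children.append(subtree)
    return tuple(children)

def is_binary(tree):
    """True iff every internal node of the nested-tuple tree has two children."""
    if not isinstance(tree, tuple):
        return True
    return len(tree) == 2 and all(is_binary(child) for child in tree)
```

For example, the triples $ab|c$ and $cd|a$ on the taxa `a, b, c, d` yield the binary tree `(("a", "b"), ("c", "d"))`, whereas the single triple $ab|c$ on the same taxa yields the non-binary tree `(("a", "b"), "c", "d")`, every refinement of which still displays the triple.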